-
Notifications
You must be signed in to change notification settings - Fork 465
Add Identifier wrapper that strips backticks from token text #2576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Identifier wrapper that strips backticks from token text #2576
Conversation
581dea4 to
7d0faac
Compare
ahoppen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting up the PR 👍🏽
Sources/SwiftSyntax/Identifier.swift
Outdated
| /// An abstraction for sanitized values on a token. | ||
| public struct Identifier: Equatable, Sendable { | ||
| /// The sanitized `text` of a token. | ||
| public let name: String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to store the name as pure bytes instead of a String. That way the identifier actually represents the raw byte name of the identifier in the source file and doesn’t do any unicode normalization.
The best option would probably to use SyntaxText (TokenSyntax.rawText) but SyntaxText doesn’t own the underlying text buffer, so Identifier would also need to keep the SyntaxArena it was allocated in alive (RetainedSyntaxArena).
CC @rintaro In case you have opinions / thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just chatted to @rintaro about this.
What we want is two types:
- An
@_spi(RawSyntax) public struct RawIdentifierthat stores the name of the identifier without keeping theSyntaxArenaalive. This is intended to be used in performance-critical situations that can guarantee that theSyntaxArenastays alive and don’t want to pay the ref-counting overhead for it. - A
public struct Identifierthat wrapsRawIdentifierand also keeps the syntax arena alive usingRetainedSyntaxArena. Identifier should then have a computed propertyname: Stringthat returns a Unicode-normalized version of the identifier.Stringis probably what most clients choose to use but for uses in the compiler, we need to be byte accurate because the compiler considersU+00E0(à LATIN SMALL LETTER A WITH GRAVE) to be different thanU+0061 U+0300(a LATIN SMALL LETTER A followed by ̀ COMBINING GRAVE ACCENT) while Swift’sStringperforms Unicode normalization and considers them the same (print("\u{e0}" == "\u{61}\u{300}")printstrue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ahoppen thanks for following up on this!
I'm looking in to the above and wanted to clarify whether the RawIdentifier.name needs to be a SyntaxText or a RawSyntax
I've pushed a WIP commit which covers the above and uses a SyntaxText which mostly works (except for the test testRawIdentifier() which I'll look in to after the conclusion of this conversation)
However on attempting to use RawSyntax type I'm getting a bit lost in the code from me being so new to this codebase.
Are you able to:
- clarify if we want
RawIdentifier.nameto be aSyntaxTextor aRawSyntax - look at my latest WIP commit and provide some input on my attempts so far?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RawIdentifier.name should be a SyntaxText. Sorry if that wasn’t clear from my last comment.
ac33a8c to
3b4820e
Compare
Added a new Identifier type which contains a name property This name property contains a sanitized version of the TokenSyntax's text property which for now only consists of trimming backticks
This acts as a convenience property to convert a TokenSyntax to an Identifier
This isn't explicitly needed for this problem but it seems as though the default for all types that can conform to Sendable/Hashable should This also sets up this new type for Swift 6.0 (and the current Swift 5.10) to pass this type around safely when needed As well as allows Identifier to be a key in a dictionary which could be a common scenario
3b4820e to
336b4b3
Compare
When trying to create an Identifier from a non-identifier token, the initializer should fail, returning nil Additionally the identifier property of the TokenSyntax should also return nil
ahoppen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This is going in a great direction. It’s about the design that I had in mind.
Test functions with backticks in their identifier should have them filtered out. This is a stopgap until swiftlang/swift-syntax#2576 is ready.
|
@adammcarter Are you planning on coming back to this? Do you mind if I put up a patch that addresses these comments? |
|
@adammcarter @ahoppen I took a shot at addressing the review comments. |
ahoppen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking this up @plemarquand! I’ve got a few comments inline. Could you also add an entry to the release notes?
|
@swift-ci please test |
ahoppen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Looks great!
|
@swift-ci Please test |
|
@swift-ci Please test Windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add Identifier.swift to CMakeLists.txt. swift-syntax is built using CMake when it’s linked into the compiler.
|
@swift-ci Please test |
1 similar comment
|
@swift-ci Please test |
Motivation
Following the issue opened in #1936, take the code below as an example:
When wanting to pull out the character
you will have to manually strip the backslashes to get the
zcharacter.The backslashes are completely valid in this case but often using the Swift syntax you might want to remove the backslashes to get the sanitised name of the token for your own purposes which can introduce unnecessary boilerplate.
Solution
Thinking forward, this new
Identifiertype allows an abstraction over tokens to further sanitise/strip data out, for example, backticks (as in this PR) or future updates like:é#symbols such as in#if/*,///or//With the changes in this PR we can take the same example code:
And by calling
[token].identifier.namewe can getzwithout the backticks.Alternatives considered
Stripping the backslashes when returning the
.textis one solution, but the backticks are a valid part of the text and so wouldn't be an accurate way of returning the source codeTrimming the backticks as part of
trimmedDescriptionis a valid and simpler solution IMO, but doesn't allow for future implementations like the above under theIdentifierabstraction